
Fix race condition in GC #402

Merged

Besroy merged 1 commit into eBay:main from Besroy:fix_crash
Mar 24, 2026

Conversation

Contributor

@Besroy Besroy commented Mar 23, 2026

When GC resets move_to_chunk via purge_reserved_chunk(), stale repl_reqs may still exist and be cleaned up by background gc_repl_reqs(). This causes two race conditions:

  1. Stale rreq frees blk on NEW allocator after reset (wrong allocator)
  2. Stale rreq frees blk on OLD allocator during reset, accessing destroyed superblock and causing crash

Issue link: #401


codecov-commenter commented Mar 23, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.20%. Comparing base (1746bcc) to head (66c61c2).
⚠️ Report is 167 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   63.15%   54.20%   -8.96%     
==========================================
  Files          32       36       +4     
  Lines        1900     5267    +3367     
  Branches      204      656     +452     
==========================================
+ Hits         1200     2855    +1655     
- Misses        600     2114    +1514     
- Partials      100      298     +198     

☔ View full report in Codecov by Sentry.

Collaborator

@xiaoxichen xiaoxichen left a comment


lgtm

// NOTE: This introduces potential double-free risk with gc_repl_reqs() background thread.
// See https://github.com/eBay/HomeObject/issues/401.
auto hs_pg = m_hs_home_object->get_hs_pg(pg_id);
hs_pg->repl_dev_->clear_chunk_req(move_to_chunk);
Collaborator


Do we need similar handling for move_from_chunk?

Collaborator


Please move this change into purge_reserved_chunk, before vchunk->reset().

Contributor Author


Moved. Also changed chunk to vchunk->get_chunk_id() (IIUC this also returns the pchunk id, please correct me if I'm wrong).

Collaborator

@JacksonYao287 JacksonYao287 left a comment


Is it possible that a new rreq is created after clear_chunk_req and before vchunk->reset(), and still hits the same issue?

// NOTE: This introduces potential double-free risk with gc_repl_reqs() background thread.
// See https://github.com/eBay/HomeObject/issues/401.
auto hs_pg = m_hs_home_object->get_hs_pg(pg_id);
hs_pg->repl_dev_->clear_chunk_req(move_to_chunk);
Collaborator


Please move this change into purge_reserved_chunk, before vchunk->reset().

Contributor Author

Besroy commented Mar 24, 2026

Is it possible that a new rreq is created after clear_chunk_req and before vchunk->reset(), and still hits the same issue?

Good question. Since the chunk is already a move_to_chunk, i.e. it sits in the m_reserved_chunk_queue, will there still be new rreqs on this chunk? If I understand correctly, there are two types of write IO on a chunk: put and delete. A put needs an open shard; since there is no open shard on this chunk, it would first have to create one, and that creation is blocked in select_specific_chunk because chunk_state=GC. A delete does not occupy space, so no blks are distributed. If that's the case, there is currently no risk.

However, it would be good if you could also review the blob-path logic while sorting out the shard process, to see whether there are other potential issues with GC concurrency.

@Besroy Besroy force-pushed the fix_crash branch 3 times, most recently from 277fbe8 to e7df539 Compare March 24, 2026 03:23
When GC resets move_to_chunk via purge_reserved_chunk(), stale repl_reqs may still exist and be cleaned up by background gc_repl_reqs().
This causes two race conditions:
 1. Stale rreq frees blk on NEW allocator after reset (wrong allocator)
 2. Stale rreq frees blk on OLD allocator during reset, accessing destroyed superblock and causing crash
@Besroy Besroy merged commit 5cb5144 into eBay:main Mar 24, 2026
25 checks passed


4 participants